
    Optimization of the Distributed I/O Subsystem of the k-Wave Project

    This thesis deals with an efficient solution for parallel writing of large volumes of data on the Lustre file system. The work will be used by the k-Wave project, designed for time-domain acoustic and ultrasound simulations. Since the simulation is computationally and data intensive, the project has to be implemented with libraries for parallel computing (Open MPI) and large data storage (HDF5), and it must run on a supercomputer. The application is implemented in C and uses the mentioned libraries. Proper settings of the Lustre file system lead to a peak write bandwidth of 2.5 GB/s, which corresponds to a speedup factor of 5 compared to the reference settings. Data aggregation improved the write bandwidth by a factor of 3 compared to a naive version; for certain block sizes the achieved I/O bandwidth hits the limits of the Anselm I/O subsystem (3 GB/s).
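
    The thesis's code is not reproduced in this listing, but the approach the abstract describes, collective parallel HDF5 writes over MPI-IO with Lustre striping hints, can be sketched roughly as follows. The stripe values, dataset shape and file name are illustrative assumptions, not the actual configuration used on Anselm.

        /* Sketch: collective parallel HDF5 write over MPI-IO with Lustre striping
         * hints. Stripe values, dataset size and file name are illustrative only. */
        #include <mpi.h>
        #include <hdf5.h>
        #include <stdlib.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, nprocs;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            /* ROMIO hints that map to Lustre striping (assumed values). */
            MPI_Info info;
            MPI_Info_create(&info);
            MPI_Info_set(info, "striping_factor", "16");      /* number of OSTs */
            MPI_Info_set(info, "striping_unit",   "4194304"); /* 4 MiB stripes  */

            hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
            H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);
            hid_t file = H5Fcreate("output.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

            /* Each rank owns one contiguous block of a 1-D dataset. */
            hsize_t local  = 1 << 22;                /* 4 Mi doubles per rank */
            hsize_t global = local * nprocs;
            hsize_t offset = local * rank;
            hid_t filespace = H5Screate_simple(1, &global, NULL);
            hid_t memspace  = H5Screate_simple(1, &local, NULL);
            hid_t dset = H5Dcreate2(file, "p", H5T_NATIVE_DOUBLE, filespace,
                                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
            H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &local, NULL);

            double *buf = malloc(local * sizeof(double));
            for (hsize_t i = 0; i < local; i++) buf[i] = (double)rank;

            /* Collective data transfer is what lets MPI-IO aggregate the writes. */
            hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
            H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
            H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

            free(buf);
            H5Pclose(dxpl); H5Dclose(dset); H5Sclose(memspace); H5Sclose(filespace);
            H5Fclose(file); H5Pclose(fapl); MPI_Info_free(&info);
            MPI_Finalize();
            return 0;
        }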

    Optimization of the Distributed I/O Subsystem of the k-Wave Project

    This thesis deals with an efficient solution for the parallel I/O of the k-Wave tool, which is designed for time-domain acoustic and ultrasound simulations. k-Wave is a supercomputer application: it runs on the Lustre file system, it has to be implemented with MPI, and it stores its data in a format suitable for large data volumes (HDF5). I designed three optimization methods that fit k-Wave's needs, based on data accumulation and redistribution techniques. Compared with the native write, every optimization method improved the write speed, up to 13.6 GB/s. These methods can be used to optimize any application with distributed data and frequent writes.
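
    The three methods themselves are not described in this listing; the following is only a minimal sketch of the underlying accumulation idea, in which a few aggregator ranks collect the blocks of many compute ranks before touching the file system. The 1:4 aggregator ratio, the buffer size and the omission of the actual write call are assumptions made for brevity.

        /* Sketch: accumulate data from many compute ranks onto a few aggregator
         * ranks before writing. The 1:4 aggregator ratio and buffer sizes are
         * illustrative assumptions, not the thesis's actual parameters. */
        #include <mpi.h>
        #include <stdlib.h>

        #define AGGR_RATIO 4          /* one aggregator per 4 ranks (assumed) */

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, nprocs;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            /* Group ranks so that each group shares one aggregator (local rank 0). */
            int group = rank / AGGR_RATIO;
            MPI_Comm group_comm;
            MPI_Comm_split(MPI_COMM_WORLD, group, rank, &group_comm);
            int grank, gsize;
            MPI_Comm_rank(group_comm, &grank);
            MPI_Comm_size(group_comm, &gsize);

            /* Every rank produces a local block; aggregators collect the blocks. */
            const int local_n = 1 << 20;
            float *local_buf = malloc(local_n * sizeof(float));
            for (int i = 0; i < local_n; i++) local_buf[i] = (float)rank;

            float *agg_buf = NULL;
            if (grank == 0)
                agg_buf = malloc((size_t)local_n * gsize * sizeof(float));

            MPI_Gather(local_buf, local_n, MPI_FLOAT,
                       agg_buf,   local_n, MPI_FLOAT, 0, group_comm);

            if (grank == 0) {
                /* Only aggregators touch the file system, writing one large,
                 * contiguous block instead of many small ones (write call omitted). */
            }

            free(local_buf); free(agg_buf);
            MPI_Comm_free(&group_comm);
            MPI_Finalize();
            return 0;
        }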

    MERIC and RADAR generator: tools for energy evaluation and runtime tuning of HPC applications

    This paper introduces two tools for manual energy evaluation and runtime tuning developed at IT4Innovations in the READEX project. The MERIC library can be used for manual instrumentation and analysis of any application from the energy and time consumption point of view. Besides tracing, MERIC can also change environment and hardware parameters during the application runtime, which leads to energy savings. MERIC stores large amounts of data, which are difficult to read by a human. The RADAR generator analyses the MERIC output files to find the best settings of the evaluated parameters for each instrumented region. It generates a report and a MERIC configuration file for application production runs.
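
    As an illustration of what manual instrumentation with MERIC looks like, the sketch below marks two regions of an MPI application. The header and function names (meric.h, MERIC_Init, MERIC_MeasureStart, MERIC_MeasureStop, MERIC_Close) are quoted from memory of the READEX documentation and should be verified against the installed MERIC headers; the regions and the loop are purely illustrative.

        /* Sketch of manual region instrumentation in the MERIC style. The header
         * and function names are assumptions (see the note above) and the two
         * regions are placeholders for real application phases. */
        #include <mpi.h>
        #include <meric.h>                      /* assumed header name */

        static void compute_step(void) { /* application kernel (stub) */ }
        static void io_step(void)      { /* application I/O (stub)    */ }

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            MERIC_Init();                       /* start energy/time measurement */

            for (int iter = 0; iter < 100; iter++) {
                MERIC_MeasureStart("compute");  /* instrumented region "compute" */
                compute_step();
                MERIC_MeasureStop();

                MERIC_MeasureStart("io");       /* instrumented region "io"      */
                io_step();
                MERIC_MeasureStop();
            }

            MERIC_Close();                      /* flush data for the RADAR generator */
            MPI_Finalize();
            return 0;
        }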

    Domain knowledge specification for energy tuning

    To overcome the challenges of energy consumption of HPC systems, the European Union Horizon 2020 READEX (Runtime Exploitation of Application Dynamism for Energy-efficient Exascale computing) project uses an online auto-tuning approach to improve the energy efficiency of HPC applications. The READEX methodology pre-computes optimal system configurations at design time, such as the CPU frequency, for instances of program regions, and at runtime switches to the configuration given in the tuning model whenever the region is executed. READEX goes beyond previous approaches by exploiting dynamic changes of a region's characteristics through region- and characteristic-specific system configurations. While the tool suite supports an automatic approach, specifying domain knowledge such as the structure and characteristics of the application and the application tuning parameters can significantly help to create a more refined tuning model. This paper presents the means available for an application expert to provide domain knowledge and reports tuning results for several benchmarks.
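
    The design-time/runtime split can be illustrated with a small sketch that is not the READEX implementation: a static table plays the role of the tuning model, and a placeholder function stands in for the mechanism that actually applies a configuration (READEX does this through its runtime library and the system's DVFS interfaces). Region names and frequency values are made up.

        /* Minimal sketch of switching pre-computed system configurations at
         * region entry. The tuning-model contents and apply_core_frequency()
         * are hypothetical placeholders, not the READEX implementation. */
        #include <stdio.h>
        #include <string.h>

        struct region_config {
            const char *region;     /* instrumented region name                 */
            int core_freq_mhz;      /* CPU core frequency chosen at design time */
            int uncore_freq_mhz;    /* uncore frequency chosen at design time   */
        };

        /* Design-time tuning model: best configuration per region (made-up values). */
        static const struct region_config tuning_model[] = {
            { "dense_blas",  2500, 2700 },  /* compute bound: high core frequency */
            { "sparse_blas", 2100, 2700 },  /* memory bound: lower core frequency */
            { "mpi_comm",    1800, 2200 },  /* communication: lowest frequencies  */
        };

        static void apply_core_frequency(int core_mhz, int uncore_mhz)
        {
            /* Placeholder: a real implementation would use a DVFS interface
             * (cpufreq, MSRs, ...) which requires appropriate privileges. */
            printf("switching to %d MHz core / %d MHz uncore\n", core_mhz, uncore_mhz);
        }

        /* Called on entry of every instrumented region at runtime. */
        static void on_region_enter(const char *region)
        {
            for (size_t i = 0; i < sizeof tuning_model / sizeof tuning_model[0]; i++)
                if (strcmp(tuning_model[i].region, region) == 0)
                    apply_core_frequency(tuning_model[i].core_freq_mhz,
                                         tuning_model[i].uncore_freq_mhz);
        }

        int main(void)
        {
            on_region_enter("dense_blas");
            on_region_enter("mpi_comm");
            return 0;
        }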

    Domain Knowledge Specification for Energy Tuning

    The European Horizon 2020 project READEX is developing a tool suite for dynamic energy tuning of HPC applications. While the tool suite supports an automatic approach, domain knowledge can significantly help in the analysis and runtime tuning phases. This paper presents the means available in READEX for the application expert to provide such knowledge to the tool suite.

    Application instrumentation for performance analysis and tuning with focus on energy efficiency

    Profiling and tuning of parallel applications is an essential part of HPC. Analysis and elimination of application hot spots can be performed using many available tools, which also provide resource consumption measurements for instrumented parts of the code. Since complex applications show different behavior in each part of the code, it is essential to be able to insert instrumentation into these parts. Because each performance analysis or autotuning tool can bring different insights into application behavior, it is valuable to analyze and optimize an application using a variety of them. We present a shared C/C++ API, inserted on request, for the most common open-source HPC performance analysis tools, which simplifies the process of manual instrumentation. Besides manual instrumentation, profiling libraries provide other methods of instrumentation. Of these, binary patching is the most universal mechanism, and it greatly improves the user-friendliness and robustness of a tool. We provide an overview of the most commonly used binary patching tools and describe a workflow for using them to implement a binary instrumentation tool for any profiler or autotuner. We have also evaluated the minimum overhead of manual and binary instrumentation.
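
    A unified begin/end API of the kind described above can be sketched as a thin header that dispatches to a profiler chosen at compile time. The Score-P user macros in the first branch are taken from Score-P's user-instrumentation interface and should be checked against the installed headers; the fallback branch only prints wall-clock times and does not support nested regions.

        /* Sketch of a unified begin/end instrumentation API with a compile-time
         * selected backend. Build with -DUSE_SCOREP to dispatch to Score-P's
         * user instrumentation; otherwise a simple timing fallback is used. */
        #ifndef INSTRUMENT_H
        #define INSTRUMENT_H

        #if defined(USE_SCOREP)
          #include <scorep/SCOREP_User.h>
          #define REGION_BEGIN(name) \
              SCOREP_USER_REGION_BY_NAME_BEGIN(name, SCOREP_USER_REGION_TYPE_COMMON)
          #define REGION_END(name) \
              SCOREP_USER_REGION_BY_NAME_END(name)
        #else
          /* Fallback backend: plain wall-clock timing printed to stderr. */
          #include <stdio.h>
          #include <time.h>
          static double instr_now(void)
          {
              struct timespec ts;
              clock_gettime(CLOCK_MONOTONIC, &ts);
              return ts.tv_sec + ts.tv_nsec * 1e-9;
          }
          static double instr_start_time;
          #define REGION_BEGIN(name) (instr_start_time = instr_now())
          #define REGION_END(name) \
              fprintf(stderr, "%s: %.6f s\n", (name), instr_now() - instr_start_time)
        #endif

        #endif /* INSTRUMENT_H */

    An application then brackets its phases with REGION_BEGIN("solver") and REGION_END("solver") and keeps the same source regardless of which analysis backend was selected at build time.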

    DGX-A100 face to face DGX-2: performance, power and thermal behavior evaluation

    Nvidia is a leading producer of GPUs for high-performance computing and artificial intelligence, bringing top performance and energy efficiency. We present a performance, power consumption, and thermal behavior analysis of the new Nvidia DGX-A100 server equipped with eight A100 Ampere-microarchitecture GPUs. The results are compared against the previous generation of the server, the Nvidia DGX-2, based on Tesla V100 GPUs. We developed a synthetic benchmark to measure the raw performance of floating-point computing units, including Tensor Cores. Furthermore, thermal stability was investigated. In addition, a Dynamic Voltage and Frequency Scaling (DVFS) analysis was performed to determine the most energy-efficient configuration of the GPUs executing workloads of various arithmetic intensities. Under the energy-optimal configuration the A100 GPU reaches an efficiency of 51 GFLOPS/W for a double-precision workload and 91 GFLOPS/W for a Tensor Core double-precision workload, which makes the A100 the most energy-efficient server accelerator for scientific simulations on the market.
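
    The DVFS analysis relies on locking the GPU to a chosen application clock and measuring power while a workload runs. The paper's benchmark is not reproduced here; the sketch below only shows the NVML calls such a measurement loop would revolve around. The clock values are placeholders and error handling is minimal.

        /* Sketch: set an application clock on GPU 0 and sample its power draw via
         * NVML. Clock values are placeholders; a real sweep would iterate over
         * nvmlDeviceGetSupportedMemoryClocks/GraphicsClocks and run a workload
         * between samples. Link with -lnvidia-ml. */
        #include <stdio.h>
        #include <unistd.h>
        #include <nvml.h>

        int main(void)
        {
            nvmlDevice_t dev;
            if (nvmlInit() != NVML_SUCCESS) return 1;
            nvmlDeviceGetHandleByIndex(0, &dev);

            /* Lock application clocks: memory clock, then graphics (SM) clock in MHz.
             * Changing clocks usually requires root or suitable permissions. */
            unsigned int mem_mhz = 1215, sm_mhz = 1095;   /* placeholder values */
            if (nvmlDeviceSetApplicationsClocks(dev, mem_mhz, sm_mhz) != NVML_SUCCESS)
                fprintf(stderr, "could not set application clocks\n");

            /* Sample power (reported in milliwatts) while a workload runs elsewhere. */
            for (int i = 0; i < 10; i++) {
                unsigned int mw = 0;
                nvmlDeviceGetPowerUsage(dev, &mw);
                printf("sample %d: %.1f W\n", i, mw / 1000.0);
                sleep(1);
            }

            nvmlDeviceResetApplicationsClocks(dev);
            nvmlShutdown();
            return 0;
        }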

    Evaluation of the HPC applications dynamic behavior in terms of energy consumption

    This paper introduces the READEX project tuning approach, which exploits the dynamic behavior of an application and its potential for energy savings. The paper focuses on the manual evaluation of applications from the energy consumption optimization point of view. As examples, we have selected one complex application, the ESPRESO library, and two simplified applications from the ProxyApps benchmark suite. ESPRESO contains many types of operations, including I/O, communication, sparse BLAS and dense BLAS. The results show static savings of 5.6–12.3% and dynamic savings of 4.7–9.1%. The highest total savings for ESPRESO are 21.4%, as a combination of 12.3% static savings and 9.1% dynamic savings. The ProxyApps applications, Kripke and Lulesh, were evaluated in two configurations each. The first Kripke configuration saved 29.3% of energy, almost entirely by static tuning. The second configuration shows only 18.8% savings, but over a third of that was achieved by dynamically switching the CPU core and uncore frequencies. The Lulesh test cases saved 28.9% and 26.7%, respectively.
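
    The reported numbers add up exactly (12.3% + 9.1% = 21.4%), which is consistent with the static and dynamic savings both being expressed relative to the same default-configuration baseline. Under that assumption the quantities combine as

        S_{\mathrm{static}}  = \frac{E_{\mathrm{default}} - E_{\mathrm{static}}}{E_{\mathrm{default}}}, \qquad
        S_{\mathrm{dynamic}} = \frac{E_{\mathrm{static}} - E_{\mathrm{dynamic}}}{E_{\mathrm{default}}}, \qquad
        S_{\mathrm{total}}   = S_{\mathrm{static}} + S_{\mathrm{dynamic}}
                             = \frac{E_{\mathrm{default}} - E_{\mathrm{dynamic}}}{E_{\mathrm{default}}}

    where E_default, E_static and E_dynamic denote the energy consumed with the default settings, with the best single static configuration, and with per-region dynamic switching on top of it.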

    A massively parallel and memory-efficient FEM toolbox with a hybrid total FETI solver with accelerator support

    In this article, we present the ExaScale PaRallel finite element tearing and interconnecting SOlver (ESPRESO) finite element method (FEM) library, which includes an FEM toolbox with interfaces to professional and open-source simulation tools, and a massively parallel hybrid total finite element tearing and interconnecting (HTFETI) solver which can fully utilize the Oak Ridge Leadership Computing Facility Titan supercomputer and achieve superlinear scaling. This article presents several new techniques for finite element tearing and interconnecting (FETI) solvers designed for efficient utilization of supercomputers, with a focus on (i) performance: we present a fivefold reduction of solver runtime for the Laplace equation by redesigning the FETI solver and offloading the key workload to the accelerator, and we compare Intel Xeon Phi 7120p and Tesla K80 and P100 accelerators to Intel Xeon E5-2680v3 and Xeon Phi 7210 central processing units; and (ii) memory efficiency: we present two techniques which increase the efficiency of the HTFETI solver 1.8 times and push the limits of the largest problem ESPRESO can solve from 124 to 223 billion unknowns for problems with unstructured meshes. Finally, we show that by dynamically tuning hardware parameters, we can reduce energy consumption by up to 33%.
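
    The abstract does not state which kernel is offloaded; as an assumption, dense per-subdomain operators (such as local Schur complements) applied in every solver iteration are a typical candidate in FETI-type methods, and their offload to a GPU with cuBLAS can be sketched as follows. The matrix size and contents are placeholders.

        /* Sketch: offload a dense matrix-vector product to the GPU with cuBLAS.
         * In HTFETI-type solvers a dense per-subdomain operator applied many
         * times per iteration is a typical offload candidate; the matrix here
         * is a random placeholder. Build e.g. with: nvcc offload.c -lcublas */
        #include <stdio.h>
        #include <stdlib.h>
        #include <cuda_runtime.h>
        #include <cublas_v2.h>

        int main(void)
        {
            const int n = 2048;                      /* placeholder subdomain size */
            double *A = malloc((size_t)n * n * sizeof(double));
            double *x = malloc(n * sizeof(double));
            double *y = malloc(n * sizeof(double));
            for (size_t i = 0; i < (size_t)n * n; i++) A[i] = rand() / (double)RAND_MAX;
            for (int i = 0; i < n; i++) x[i] = 1.0;

            double *dA, *dx, *dy;
            cudaMalloc((void **)&dA, (size_t)n * n * sizeof(double));
            cudaMalloc((void **)&dx, n * sizeof(double));
            cudaMalloc((void **)&dy, n * sizeof(double));
            cudaMemcpy(dA, A, (size_t)n * n * sizeof(double), cudaMemcpyHostToDevice);
            cudaMemcpy(dx, x, n * sizeof(double), cudaMemcpyHostToDevice);

            cublasHandle_t handle;
            cublasCreate(&handle);
            const double alpha = 1.0, beta = 0.0;
            /* y = A * x on the device; in a solver this call sits inside the
             * iteration loop while A stays resident on the GPU. */
            cublasDgemv(handle, CUBLAS_OP_N, n, n, &alpha, dA, n, dx, 1, &beta, dy, 1);
            cudaMemcpy(y, dy, n * sizeof(double), cudaMemcpyDeviceToHost);

            printf("y[0] = %f\n", y[0]);
            cublasDestroy(handle);
            cudaFree(dA); cudaFree(dx); cudaFree(dy);
            free(A); free(x); free(y);
            return 0;
        }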